ATOM Documentation

Voice Wake Activation - Implementation Guide

Overview

Voice wake activation allows users to control ATOM agents using natural voice commands through the menu bar companion.

**Current Status**: Foundation Complete

  • ✅ Backend voice command processor
  • ✅ Frontend voice client
  • ✅ Voice feedback UI
  • ⏳ Native speech recognition (platform-specific)

---

Architecture

Components

  1. **Voice Command Processor** (Python)
     • Natural language intent classification
     • Command routing and execution
     • Agent integration
  2. **Voice Commands Client** (TypeScript)
     • Speech Recognition API wrapper
     • WebSocket communication
     • React hooks for UI integration
  3. **Voice Feedback UI** (React)
     • Listening state indicator
     • Transcript display
     • Audio waveform visualization
  4. **Native Integration** (Rust/Tauri)
     • Platform-specific speech recognition
     • Wake word detection
     • Audio capture

---

Current Implementation

Backend (Complete)

Voice Command Processor

```python
from core.voice_command_processor import VoiceCommandProcessor

processor = VoiceCommandProcessor(db)
result = await processor.process_voice_command(
    tenant_id="tenant-123",
    agent_id="agent-456",
    command="Open dashboard",
    confidence=0.95,
)
```

API Endpoint

```http
POST /api/voice/command
Content-Type: application/json
Authorization: Bearer <token>

{
  "command": "Execute sales report",
  "confidence": 0.92,
  "agent_id": "agent-456"
}
```
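
From a script, the request above could be assembled as follows. This is a sketch only: the host, port, and token are placeholders, and `build_voice_command_request` is a helper introduced here for illustration.

```python
# Sketch: building an authenticated request for POST /api/voice/command.
# Host and token values are placeholders, not real endpoints/credentials.
import json
import urllib.request


def build_voice_command_request(base_url: str, token: str,
                                command: str, confidence: float,
                                agent_id: str) -> urllib.request.Request:
    """Build an authenticated POST request for the voice command endpoint."""
    payload = json.dumps({
        "command": command,
        "confidence": confidence,
        "agent_id": agent_id,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/voice/command",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_voice_command_request(
    "http://localhost:8000", "demo-token",
    "Execute sales report", 0.92, "agent-456",
)
# Send with urllib.request.urlopen(req) once a server is running.
```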

Frontend (Complete)

Voice Command Client

```tsx
import { useVoiceCommands } from '@/lib/voice-commands';

function VoiceControl() {
  const { isListening, transcript, startListening, sendCommand } =
    useVoiceCommands('tenant-123');

  return (
    <>
      <button onClick={startListening}>Start Listening</button>
      {isListening && <p>Listening...</p>}
      {transcript && <p>You said: {transcript}</p>}
    </>
  );
}
```

Voice Feedback Component

```tsx
import { VoiceFeedback } from '@/components/menu-bar/VoiceFeedback';

<VoiceFeedback
  isListening={true}
  transcript="Open dashboard"
  response="Opening dashboard now"
/>
```

---

Platform-Specific Implementation

macOS

Native Speech Recognition

```rust
// src-tauri/src/voice_wake_macos.rs
use cocoa::base::id;

pub struct MacOSVoiceWake {
    recognizer: id, // SFSpeechRecognizer
    is_listening: bool,
}

impl MacOSVoiceWake {
    pub fn new(_wake_word: &str) -> Self {
        // Initialize SFSpeechRecognizer
        // Request microphone and speech-recognition permissions
        // Configure for continuous listening
        Self {
            recognizer: create_speech_recognizer(),
            is_listening: false,
        }
    }

    pub fn start_listening(&mut self) {
        // Start real-time speech recognition and
        // detect the wake word in the audio stream
        self.is_listening = true;
    }
}
```

Permissions

macOS requires usage-description strings for the microphone and speech recognition. These belong in the app's `Info.plist` (not `tauri.conf.json`):

```xml
<!-- src-tauri/Info.plist -->
<key>NSMicrophoneUsageDescription</key>
<string>ATOM uses the microphone to listen for voice commands.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>ATOM uses speech recognition to process voice commands.</string>
```

Windows (Future)

Windows Speech Platform

```rust
// src-tauri/src/voice_wake_windows.rs
use windows::Win32::Media::Speech::*;

pub struct WindowsVoiceWake {
    recognizer: ISpeechRecognizer,
}

impl WindowsVoiceWake {
    pub fn new(_wake_word: &str) -> Self {
        // Initialize the Windows Speech Platform
        Self {
            recognizer: create_speech_recognizer(),
        }
    }
}
```

Linux (Future)

Pocketsphinx Integration

```rust
// src-tauri/src/voice_wake_linux.rs
use pocketsphinx::*;

pub struct LinuxVoiceWake {
    decoder: ps_decoder_t,
}

impl LinuxVoiceWake {
    pub fn new(_wake_word: &str) -> Self {
        // Initialize Pocketsphinx and load the acoustic model
        Self {
            decoder: create_decoder(),
        }
    }
}
```

---

Supported Commands

Command Types

1. Agent Execution

"Hey ATOM, execute reconcile inventory"
"ATOM, run the sales report"
"Send a summary email to John"

2. Dashboard Navigation

"Open my dashboard"
"Go to dashboard"
"Show me the dashboard"

3. Status Queries

"Check agent health"
"What's the status?"
"How are the agents doing?"

4. Quick Actions

"Pause all agents"
"Enable autonomous mode"
"Show recent interventions"
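
Routing between these four command types can be illustrated with a minimal keyword matcher. This is a sketch only: the actual `VoiceCommandProcessor` presumably uses richer natural language classification, and the intent names below are illustrative.

```python
# A minimal keyword-based classifier for the four command types above.
# Intent names and keyword lists are illustrative, not the ATOM schema.
def classify_intent(command: str) -> str:
    text = command.lower()
    # Strip an optional wake-word prefix such as "Hey ATOM," or "ATOM,"
    for prefix in ("hey atom", "atom"):
        if text.startswith(prefix):
            text = text[len(prefix):].lstrip(" ,")
            break
    if any(kw in text for kw in ("execute", "run", "send")):
        return "agent_execution"
    if "dashboard" in text:
        return "dashboard_navigation"
    if any(kw in text for kw in ("status", "health", "doing")):
        return "status_query"
    if any(kw in text for kw in ("pause", "enable", "show")):
        return "quick_action"
    return "unknown"
```

Checking `"dashboard"` before the quick-action keywords keeps "Show me the dashboard" routed to navigation rather than matching on "show".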

---

Usage Flow

User Interaction

  1. **Wake Word Detection**: the native layer hears "Hey ATOM"
  2. **Command Listening**: audio after the wake word is captured and transcribed
  3. **Intent Classification**: the transcript is mapped to a command type
  4. **Action Execution**: the matched action runs against the backend
  5. **Feedback**: transcript and response are shown in the menu bar UI
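
The five steps above can be sketched as one pipeline. Recognition, classification, and execution are stubbed here, and all names are illustrative rather than the actual ATOM APIs.

```python
# Sketch of the five-step flow as a single function. The classifier
# and executor are stubs; only the control flow mirrors the doc.
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceResult:
    transcript: str
    intent: str
    response: str


def handle_utterance(audio_text: str, wake_word: str = "hey atom") -> Optional[VoiceResult]:
    text = audio_text.lower().strip()
    # 1. Wake word detection: ignore audio that lacks the wake word
    if not text.startswith(wake_word):
        return None
    # 2. Command listening: everything after the wake word is the command
    command = text[len(wake_word):].lstrip(" ,")
    # 3. Intent classification (stubbed keyword match)
    intent = "dashboard_navigation" if "dashboard" in command else "unknown"
    # 4. Action execution (stubbed)
    response = ("Opening dashboard now" if intent == "dashboard_navigation"
                else "Sorry, I didn't catch that")
    # 5. Feedback: transcript and response go back to the UI
    return VoiceResult(transcript=command, intent=intent, response=response)
```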

---

Configuration

Wake Word Settings

```ts
const voiceConfig = {
  wakeWord: "Hey ATOM",
  sensitivity: 0.7,          // 0-1, higher = more sensitive
  language: "en-US",
  continuous: false,         // Stop after one command
  requireConfirmation: true, // Ask for confirmation on sensitive actions
};
```

Command Settings

```ts
const commandConfig = {
  confidenceThreshold: 0.7, // Minimum confidence to process
  timeout: 5000,            // Command timeout (ms)
  maxRetries: 3,            // Maximum recognition retries
};
```

---

Permissions & Security

Required Permissions

macOS

  • microphone - Audio capture
  • speech-recognition - Speech processing

Windows

  • Microphone access
  • Speech recognition access

Linux

  • Audio input device
  • Speech recognition library

Security Measures

  1. **Local Processing**
     • Wake word detected locally
     • No audio sent to the cloud until the wake word is heard
  2. **Authentication**
     • All commands authenticated via JWT
     • Tenant-isolated command execution
  3. **Audit Logging**
     • All voice commands logged
     • Entries include transcript, timestamp, and result
  4. **Confirmation Required**
     • Sensitive commands require confirmation
     • User can review before execution
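
The audit-logging measure could produce records like the following. Field names here are illustrative, not the actual ATOM schema.

```python
# Sketch: one voice-command audit record serialized as a JSON line,
# carrying the transcript, timestamp, and result named above.
import json
from datetime import datetime, timezone


def audit_log_entry(tenant_id: str, transcript: str, result: str) -> str:
    """Serialize one voice-command audit record as a JSON line."""
    return json.dumps({
        "tenant_id": tenant_id,
        "transcript": transcript,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "result": result,
    })
```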

---

Error Handling

Common Errors

Error: Speech Recognition Not Supported

```ts
if (!client.isSupported()) {
  showNotification(
    "Voice commands not supported in this browser",
    "error"
  );
}
```

Error: Low Confidence Recognition

```ts
if (confidence < 0.7) {
  return {
    success: false,
    message: "I didn't catch that. Could you repeat?",
    requires_confirmation: true
  };
}
```

Error: Microphone Access Denied

```ts
navigator.mediaDevices.getUserMedia({ audio: true })
  .catch(err => {
    showNotification(
      "Microphone access denied. Please enable in browser settings.",
      "error"
    );
  });
```

---

Testing

Manual Testing

  1. **Test Wake Word**
  2. **Test Commands**
  3. **Test Low Confidence**
  4. **Test Background Noise**

Automated Tests

```ts
// Test voice command classification
describe('Voice Command Processor', () => {
  it('should classify dashboard command', async () => {
    const result = await processor.process_voice_command(
      tenant_id,
      agent_id,
      "Open dashboard",
      0.95
    );
    expect(result.action).toBe('open_url');
  });

  it('should reject low confidence commands', async () => {
    const result = await processor.process_voice_command(
      tenant_id,
      agent_id,
      "Mumble mumble",
      0.5
    );
    expect(result.success).toBe(false);
  });
});
```

---

Performance

Latency Targets

| Operation             | Target   | Actual |
| --------------------- | -------- | ------ |
| Wake word detection   | < 500 ms | TBD    |
| Speech to text        | < 1 s    | TBD    |
| Intent classification | < 500 ms | TBD    |
| Command execution     | < 2 s    | TBD    |
| End-to-end            | < 3 s    | TBD    |
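
While the actuals are still TBD, each stage can be checked against its budget with simple wall-clock timing. A sketch, with the stage function stubbed:

```python
# Sketch: time one pipeline stage and compare against a millisecond
# budget from the latency table. Stage functions are stubs here.
import time


def run_stage(stage_fn, budget_ms: float):
    """Run one pipeline stage and report whether it met its budget."""
    start = time.perf_counter()
    result = stage_fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms, elapsed_ms <= budget_ms


# Example: a stand-in for intent classification with a 500 ms budget
result, elapsed_ms, within_budget = run_stage(lambda: "open_url", budget_ms=500)
```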

Optimization Strategies

  1. **Local Wake Word Detection**
     • Reduce cloud dependency
     • Faster response time
  2. **Intent Caching**
     • Cache common command intents
     • Faster classification
  3. **Audio Processing**
     • Downsample audio for faster processing
     • Noise reduction for better accuracy
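
The intent-caching idea can be sketched with a memoized classifier. `classify_slow` is a stand-in for the real (expensive) classifier; normalizing the command string before caching lets repeated commands hit the cache regardless of casing.

```python
# Sketch: memoize intent classification so repeated commands skip
# the expensive classifier. classify_slow is a placeholder.
from functools import lru_cache


def classify_slow(command: str) -> str:
    # Placeholder for the real (expensive) intent classifier
    return "dashboard_navigation" if "dashboard" in command else "unknown"


@lru_cache(maxsize=256)
def classify_cached(command: str) -> str:
    # Normalize before caching so "Open Dashboard" and "open dashboard"
    # share one cache entry
    return classify_slow(command.lower().strip())
```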

---

Future Enhancements

Phase 2 (Q2 2025)

  • [ ] Windows speech recognition
  • [ ] Linux speech recognition (Pocketsphinx)
  • [ ] Custom wake word training
  • [ ] Voice biometrics for security

Phase 3 (Q3 2025)

  • [ ] Multi-language support
  • [ ] Context-aware commands
  • [ ] Command history and replay
  • [ ] Voice shortcuts/macros

Phase 4 (Q4 2025)

  • [ ] Natural conversation mode
  • [ ] Follow-up questions
  • [ ] Multi-turn dialogues
  • [ ] Voice-activated workflows

---

Troubleshooting

Issue: Wake word not detected

**Solutions**:

  1. Check microphone permissions
  2. Adjust sensitivity setting
  3. Speak closer to microphone
  4. Reduce background noise

Issue: Commands misclassified

**Solutions**:

  1. Speak clearly and at moderate pace
  2. Use exact command phrases
  3. Check confidence score
  4. Review command logs

Issue: High error rate

**Solutions**:

  1. Train custom wake word model
  2. Adjust sensitivity threshold
  3. Improve microphone quality
  4. Reduce ambient noise

---


**Status**: Foundation Complete

**Platform Support**: macOS (ready), Windows (planned), Linux (planned)

**Production Ready**: With macOS native integration